[Storage] Refactor storage operations with functors #8026

zhangchiqing · 2025-10-09T17:41:21Z

Working towards #7910

⚠️ This PR is not a full refactor of storage operations using the functor pattern; rather, it’s intended to start a discussion around the pattern itself. Please refer to the comments for specific discussion points.

codecov-commenter · 2025-10-09T17:45:23Z

Codecov Report

❌ Patch coverage is 39.05325% with 103 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
storage/operation/functor.go	40.51%	62 Missing and 7 partials ⚠️
storage/operation/guarantees.go	0.00%	21 Missing ⚠️
storage/operation/approvals.go	0.00%	8 Missing ⚠️
storage/operation/payload.go	0.00%	5 Missing ⚠️

zhangchiqing · 2025-10-09T17:46:39Z

state/protocol/badger/mutator.go

 	}
 	deferredBlockPersist.AddNextOperation(func(lctx lockctx.Proof, blockID flow.Identifier, rw storage.ReaderBatchWriter) error {
-		return operation.IndexLatestSealAtBlock(lctx, rw.Writer(), blockID, latestSeal.ID())
+		return operation.IndexingLatestSealAtBlock(blockID, latestSeal.ID())(lctx, rw)


We mentioned that any database operation requiring a lock context could be refactored using the functor pattern, but I think this case might be an exception—even after applying the refactor.

The functor isn’t particularly useful here since we don’t have the block ID until we start executing the deferred database operations.

I went ahead and refactored it anyway to illustrate my point, but in this case, it doesn’t provide any performance benefits over the original version and only adds unnecessary complexity.

Thoughts?
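The situation described above can be sketched as follows. This is a minimal, self-contained illustration rather than the PR's code: `LockProof`, `heldLocks`, `Batch`, `lockInsertBlock`, `DirectOp` and `FunctorOp` are hypothetical stand-ins, and the `Functor` signature is only a guess based on the call shape in the diff. It shows that when `blockID` arrives at execution time, the functor has to be built and invoked in the same step.

```go
package main

import "fmt"

// LockProof and Batch are hypothetical stand-ins for lockctx.Proof and
// storage.ReaderBatchWriter; they exist only to keep the sketch runnable.
type LockProof interface{ HoldsLock(lockID string) bool }

type heldLocks map[string]bool

func (h heldLocks) HoldsLock(lockID string) bool { return h[lockID] }

type Batch map[string]string

// Functor is a simplified guess at the shape of the PR's Functor type.
type Functor func(lctx LockProof, rw Batch) error

const lockInsertBlock = "lock_insert_block" // hypothetical lock ID

// IndexLatestSeal is the direct style: all arguments arrive at call time.
func IndexLatestSeal(lctx LockProof, rw Batch, blockID, sealID string) error {
	if !lctx.HoldsLock(lockInsertBlock) {
		return fmt.Errorf("missing lock %s", lockInsertBlock)
	}
	rw["latest_seal/"+blockID] = sealID
	return nil
}

// IndexLatestSealFunctor captures blockID and sealID first and writes later.
func IndexLatestSealFunctor(blockID, sealID string) Functor {
	return func(lctx LockProof, rw Batch) error {
		return IndexLatestSeal(lctx, rw, blockID, sealID)
	}
}

// DeferredOp mimics a deferred persist step: blockID is only known when it runs.
type DeferredOp func(lctx LockProof, blockID string, rw Batch) error

// DirectOp calls the write directly once blockID is available.
func DirectOp(sealID string) DeferredOp {
	return func(lctx LockProof, blockID string, rw Batch) error {
		return IndexLatestSeal(lctx, rw, blockID, sealID)
	}
}

// FunctorOp has to build the functor inside the callback and invoke it at
// once, so no work is moved ahead of the deferred execution.
func FunctorOp(sealID string) DeferredOp {
	return func(lctx LockProof, blockID string, rw Batch) error {
		return IndexLatestSealFunctor(blockID, sealID)(lctx, rw)
	}
}

func main() {
	lctx := heldLocks{lockInsertBlock: true}
	a, b := Batch{}, Batch{}
	fmt.Println(DirectOp("seal-1")(lctx, "block-A", a), a)
	fmt.Println(FunctorOp("seal-1")(lctx, "block-A", b), b)
}
```

Both variants write the same entry in this toy; the functor version only adds an extra closure allocation per call.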

zhangchiqing · 2025-10-09T19:53:57Z

storage/store/payloads.go

 	}

 	// make sure all payload guarantees are stored
-	for _, guarantee := range payload.Guarantees {


Instead of storing each individual guarantee, we pass in all the guarantees to be stored together, so that instead of creating N functors, we create only one functor to insert and index N guarantees.
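To make the difference concrete, here is a small sketch of the two shapes: one functor per guarantee versus one functor for the whole slice. All names (`Batch`, `Guarantee`, `InsertGuaranteeFunctor`, `InsertGuaranteesFunctor`, `StoreOneByOne`, `StoreTogether`) are hypothetical, the lock proof is left out for brevity, and the sketch makes no claim about the size of the performance difference.

```go
package main

import "fmt"

// Batch is a hypothetical in-memory stand-in for a pending write batch.
type Batch map[string]string

// Functor is simplified here: the lock proof is omitted to focus on batching.
type Functor func(rw Batch) error

// Guarantee is a toy record whose ID is precomputed by the caller.
type Guarantee struct{ ID, CollectionID string }

// InsertGuaranteeFunctor stores and indexes a single guarantee.
func InsertGuaranteeFunctor(g Guarantee) Functor {
	return func(rw Batch) error {
		rw["guarantee/"+g.ID] = g.CollectionID
		rw["collection/"+g.CollectionID] = g.ID
		return nil
	}
}

// InsertGuaranteesFunctor stores and indexes all guarantees with one closure.
func InsertGuaranteesFunctor(gs []Guarantee) Functor {
	return func(rw Batch) error {
		for _, g := range gs {
			rw["guarantee/"+g.ID] = g.CollectionID
			rw["collection/"+g.CollectionID] = g.ID
		}
		return nil
	}
}

// StoreOneByOne allocates N functors and runs them in order.
func StoreOneByOne(rw Batch, gs []Guarantee) error {
	for _, g := range gs {
		if err := InsertGuaranteeFunctor(g)(rw); err != nil {
			return err
		}
	}
	return nil
}

// StoreTogether allocates a single functor for the whole slice.
func StoreTogether(rw Batch, gs []Guarantee) error {
	return InsertGuaranteesFunctor(gs)(rw)
}

func main() {
	gs := []Guarantee{{"g1", "c1"}, {"g2", "c2"}, {"g3", "c3"}}
	a, b := Batch{}, Batch{}
	fmt.Println(StoreOneByOne(a, gs), StoreTogether(b, gs))
	fmt.Println(len(a) == len(b))
}
```

Both helpers leave the batch in the same state; the only difference in this toy is how many closures are created.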

zhangchiqing · 2025-10-09T19:55:35Z

storage/store/payloads.go

+		return fmt.Errorf("could not store guarantees: %w", err)
 	}

 	// make sure all payload seals are stored


In this PR, I'm using guarantees as an example to apply the functor pattern and start the discussion. I'd like to get feedback first and settle on the functor pattern before applying it to the storage operations for the rest of the payload, such as seals and results.

zhangchiqing · 2025-10-09T20:00:32Z

storage/operation/guarantees.go

+	))
+}
+
+type CollectionGuaranteeWithID struct {


Why create this struct?

Because we would like to insert N guarantees with one functor instead of N functors, and we need to bundle all the data before passing it to the functor.

Why place this struct here?

It might not be the best place, but at least it's close to where it is used (InsertAndIndexGuarantees). I'm open to ideas for a better place for this struct.


zhangchiqing · 2025-10-09T20:04:24Z

storage/operation/approvals.go

-		return nil
-	}
+	errmsg := fmt.Sprintf("InsertAndIndexResultApproval failed with approvalID %v, chunkIndex %v, resultID %v",
+		approvalID, chunkIndex, resultID)


Regarding the error message: generic functors such as Overwriting and InsertingWithMismatchCheck have no context about the calling operation, so I used WrapError to add it.
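A wrapper of this kind could look roughly like the sketch below. It is an assumption about how `WrapError` might behave, not the PR's implementation: `Batch`, `ErrDataMismatch`, `InsertWithMismatchCheckFunctor`, `IndexApprovalFunctor` and `IsMismatch` are hypothetical stand-ins, and the lock proof is omitted. The point is that the generic functor only knows the key, while the wrapper adds operation-level context and still preserves the sentinel error through `%w`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDataMismatch is a hypothetical stand-in for storage.ErrDataMismatch.
var ErrDataMismatch = errors.New("data mismatch")

// Batch is a hypothetical in-memory stand-in for a pending write batch.
type Batch map[string]string

// Functor is simplified here: the lock proof is omitted.
type Functor func(rw Batch) error

// InsertWithMismatchCheckFunctor is generic: it knows only the key and value.
func InsertWithMismatchCheckFunctor(key, val string) Functor {
	return func(rw Batch) error {
		if existing, ok := rw[key]; ok && existing != val {
			return fmt.Errorf("key %q: %w", key, ErrDataMismatch)
		}
		rw[key] = val
		return nil
	}
}

// WrapError decorates a functor so that any error carries extra context.
// Wrapping with %w keeps the underlying sentinel visible to errors.Is.
func WrapError(msg string, f Functor) Functor {
	return func(rw Batch) error {
		if err := f(rw); err != nil {
			return fmt.Errorf("%s: %w", msg, err)
		}
		return nil
	}
}

// IsMismatch reports whether err wraps ErrDataMismatch.
func IsMismatch(err error) bool { return errors.Is(err, ErrDataMismatch) }

// IndexApprovalFunctor adds operation-specific context to the generic insert.
func IndexApprovalFunctor(resultID string, chunkIndex int, approvalID string) Functor {
	key := fmt.Sprintf("approval/%s/%d", resultID, chunkIndex)
	msg := fmt.Sprintf("IndexApproval failed with approvalID %v, chunkIndex %v, resultID %v",
		approvalID, chunkIndex, resultID)
	return WrapError(msg, InsertWithMismatchCheckFunctor(key, approvalID))
}

func main() {
	rw := Batch{}
	fmt.Println(IndexApprovalFunctor("result-1", 0, "approval-A")(rw))
	err := IndexApprovalFunctor("result-1", 0, "approval-B")(rw)
	fmt.Println(err)
	fmt.Println(IsMismatch(err))
}
```

Because the message is built when the functor is constructed, the formatting cost is paid even when no error occurs, which may or may not matter in practice.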

zhangchiqing · 2025-10-09T20:05:09Z

storage/store/approvals.go

 	storing := operation.InsertAndIndexResultApproval(approval)

 	return func(lctx lockctx.Proof) error {
-		if !lctx.HoldsLock(storage.LockIndexResultApproval) {


This is redundant, because InsertAndIndexResultApproval already checks the lock.

zhangchiqing · 2025-10-09T20:05:27Z

storage/operation/writes.go


-// Upserting returns a functor, whose execution will append the given key-value-pair to the provided
-// storage writer (typically a pending batch of database writes).
-func Upserting(key []byte, val interface{}) func(storage.Writer) error {


Replaced by functors

zhangchiqing · 2025-10-09T20:09:30Z

storage/operation/guarantees.go

+	return WrapError("InsertAndIndexGuarantees failed", BindFunctors(
+		HoldingLock(storage.LockInsertBlock),
+		WrapError("insert guarantee failed", OverwritingMul(guaranteeIDKeys, guarantees)),
+		WrapError("index guarantee failed", InsertingMulWithMismatchCheck(collectionIDKeys, guaranteeIDs)),


Wrapping errors when inserting multiple records is a bit challenging. I couldn't find a cleaner way to provide context for each individual write, so I ended up wrapping the whole step.
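One possible way to get per-record context, sketched under assumptions, is to let the multi-record functor itself name the failing record, while the outer wrappers name the step and the operation. The code below is not the PR's implementation: `Batch`, `ErrDataMismatch`, `IsMismatch` and the simplified signatures of `BindFunctors`, `WrapError` and `InsertMulWithMismatchCheckFunctor` are hypothetical, and the lock proof is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDataMismatch is a hypothetical stand-in for storage.ErrDataMismatch.
var ErrDataMismatch = errors.New("data mismatch")

// Batch is a hypothetical in-memory stand-in for a pending write batch.
type Batch map[string]string

// Functor is simplified here: the lock proof is omitted.
type Functor func(rw Batch) error

// BindFunctors runs the functors in order and stops at the first error.
func BindFunctors(fs ...Functor) Functor {
	return func(rw Batch) error {
		for _, f := range fs {
			if err := f(rw); err != nil {
				return err
			}
		}
		return nil
	}
}

// WrapError adds context to any error returned by f.
func WrapError(msg string, f Functor) Functor {
	return func(rw Batch) error {
		if err := f(rw); err != nil {
			return fmt.Errorf("%s: %w", msg, err)
		}
		return nil
	}
}

// InsertMulWithMismatchCheckFunctor inserts all pairs and names the failing
// record in its error. This toy does not roll back earlier writes.
func InsertMulWithMismatchCheckFunctor(keys, vals []string) Functor {
	return func(rw Batch) error {
		if len(keys) != len(vals) {
			return fmt.Errorf("got %d keys but %d values", len(keys), len(vals))
		}
		for i, key := range keys {
			if existing, ok := rw[key]; ok && existing != vals[i] {
				return fmt.Errorf("record %d (key %q): %w", i, key, ErrDataMismatch)
			}
			rw[key] = vals[i]
		}
		return nil
	}
}

// IsMismatch reports whether err wraps ErrDataMismatch.
func IsMismatch(err error) bool { return errors.Is(err, ErrDataMismatch) }

func main() {
	rw := Batch{"collection/c2": "guarantee-old"}
	store := WrapError("InsertAndIndexGuarantees failed", BindFunctors(
		WrapError("index guarantee failed", InsertMulWithMismatchCheckFunctor(
			[]string{"collection/c1", "collection/c2"},
			[]string{"guarantee-1", "guarantee-2"},
		)),
	))
	fmt.Println(store(rw))
}
```

The trade-off is that the generic multi-record functor now owns part of the error message, which couples it slightly more to how callers report failures.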

zhangchiqing · 2025-10-09T20:16:29Z

storage/operation/guarantees.go

+// Caller must ensure guaranteeID equals to guarantee.ID()
+// Caller must acquire the [storage.LockInsertBlock] lock
+// It returns [storage.ErrDataMismatch] if a different guarantee is already indexed for the collection
+func InsertAndIndexGuarantee(guaranteeID flow.Identifier, guarantee *flow.CollectionGuarantee) Functor {


This function is currently unused and is kept only for reference.

I initially created this function by combining InsertGuarantee and IndexGuarantee and refactoring the result with the functor pattern. Later I decided to use InsertAndIndexGuarantees instead; see the comments there for my motivation.

I kept this one because it's easier to understand, and will clean it up once the discussion of the functor pattern is settled.

jordanschalm · 2025-10-28T19:59:20Z

storage/operation/functor.go

+// Overwriting returns a functor that overwrites a key-value pair in the storage.
+// The value is serialized using msgpack encoding. If the key already exists,
+// the value will be overwritten without any checks.
+//
+// This is typically used for operations where we want to update existing data
+// or where we don't care about potential conflicts.
+func Overwriting(key []byte, val interface{}) Functor {


Suggested change

-// Overwriting returns a functor that overwrites a key-value pair in the storage.
-// The value is serialized using msgpack encoding. If the key already exists,
-// the value will be overwritten without any checks.
-//
-// This is typically used for operations where we want to update existing data
-// or where we don't care about potential conflicts.
-func Overwriting(key []byte, val interface{}) Functor {
+// Upserting returns a functor that overwrites a key-value pair in the storage.
+// The value is serialized using msgpack encoding. If the key already exists,
+// the value will be overwritten without any checks.
+//
+// This is typically used for operations where we want to update existing data
+// or where we don't care about potential conflicts.
+func Upserting(key []byte, val interface{}) Functor {

I prefer the term "upsert": it matches the non-functor function with the same semantics, and better captures that we are either inserting or updating (we are only overwriting if something has already been written).

jordanschalm · 2025-10-28T23:33:23Z

storage/operation/functor.go

+//
+// This is used for operations where we want to ensure uniqueness and prevent
+// accidental overwrites of existing data.
+func InsertingWithExistenceCheck(key []byte, val interface{}) Functor {


I don't think separating this out from lock proof checking is a good idea. This function will not behave as expected if the user does not include lock checking separately. Currently there is no documentation of this critical requirement for users of this function. We could add documentation, but I think it would be better to logically couple lock proof checking with the corresponding read/write checks.

I like the idea, but I think adding too many layers of indirection will make it harder to write and verify correct use of locks. I think it would be better to couple each of these generic lock-requiring utility functions with HoldingLock.

There is a lot of boilerplate, such as the error to return, and the separation allows us to work at a higher level of abstraction.

We could also make an InsertingWithExistenceCheckAndLockCheck to bind with the lock checking, but IMO it's unnecessary.

So we are able to simplify the functor operation as:

func InsertAndIndexGuarantees(guaranteesWithID []*CollectionGuaranteeWithID) Functor {
	...
	return BindFunctors(
		CheckHoldLockFunctor(storage.LockInsertBlock),
		UpsertMulFunctor(guaranteeIDKeys, guarantees),
		InsertFunctorMulWithMismatchCheck(collectionIDKeys, guaranteeIDs),
	)
}

We could also make a InsertingWithExistenceCheckAndLockCheck to bind with the lock checking

I don't think we should have a separate function that also binds the lock check. I think InsertingWithExistenceCheck should remain as it is, except that it also does the lock check. The reason is that InsertingWithExistenceCheck must never be called on its own, it always must be called strictly after a lock check.

Currently these two things which must always happen together are separate in the API, and the fact that they must be manually called together every time for correct usage is not documented. I'm suggesting that we modify the API so that the only way to call InsertingWithExistenceCheck is to simultaneously define the required lock check.

What about keeping both:

 func InsertAndIndexGuarantees(guaranteesWithID []*CollectionGuaranteeWithID) Functor {
 	...
 	return BindFunctors(
+		// keep this lock check here to make the functor composition flexible
 		CheckHoldLockFunctor(storage.LockInsertBlock),
 		UpsertMulFunctor(guaranteeIDKeys, guarantees),
-		InsertFunctorMulWithMismatchCheck(collectionIDKeys, guaranteeIDs),
+		InsertFunctorMulWithMismatchCheck(
+			storage.LockInsertBlock, // check if this lock is held before running the insert
+			collectionIDKeys, guaranteeIDs),
 	)
 }
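The coupling discussed in this thread can be sketched as follows: the insert functor takes the required lock ID as a mandatory argument, so it cannot be constructed without also declaring the lock check. This is an illustration under assumptions, not the PR's code. `LockProof`, `heldLocks`, `Batch`, `ErrAlreadyExists`, `IsAlreadyExists` and the exact signature of `InsertWithExistenceCheckFunctor` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists is a hypothetical stand-in for a storage sentinel error.
var ErrAlreadyExists = errors.New("already exists")

// LockProof and Batch are hypothetical stand-ins for lockctx.Proof and
// storage.ReaderBatchWriter.
type LockProof interface{ HoldsLock(lockID string) bool }

type heldLocks map[string]bool

func (h heldLocks) HoldsLock(lockID string) bool { return h[lockID] }

type Batch map[string]string

// Functor is a simplified guess at the shape of the PR's Functor type.
type Functor func(lctx LockProof, rw Batch) error

// InsertWithExistenceCheckFunctor requires the lock ID up front, so the
// read-then-write below can never run without the lock check preceding it.
func InsertWithExistenceCheckFunctor(lockID, key, val string) Functor {
	return func(lctx LockProof, rw Batch) error {
		if !lctx.HoldsLock(lockID) {
			return fmt.Errorf("missing required lock %q", lockID)
		}
		if _, ok := rw[key]; ok {
			return fmt.Errorf("key %q: %w", key, ErrAlreadyExists)
		}
		rw[key] = val
		return nil
	}
}

// IsAlreadyExists reports whether err wraps ErrAlreadyExists.
func IsAlreadyExists(err error) bool { return errors.Is(err, ErrAlreadyExists) }

func main() {
	const lockInsertBlock = "lock_insert_block" // hypothetical lock ID
	insert := InsertWithExistenceCheckFunctor(lockInsertBlock, "k", "v")

	rw := Batch{}
	fmt.Println(insert(heldLocks{}, rw))                      // lock not held
	fmt.Println(insert(heldLocks{lockInsertBlock: true}, rw)) // succeeds
	fmt.Println(insert(heldLocks{lockInsertBlock: true}, rw)) // key exists
}
```

A composite functor can still keep its own top-level lock check; in this sketch the inner check is then redundant but cheap, which matches the "keep both" variant shown above.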

jordanschalm · 2025-10-28T23:36:05Z

storage/operation/functor.go

+//
+// This is used for operations where we want to ensure data consistency and
+// detect potential race conditions or conflicting updates.
+func InsertingWithMismatchCheck(key []byte, val interface{}) Functor {


I have the same worry here about separating generic logic that must be lock-protected from the logic checking the lock, as mentioned above.

jordanschalm · 2025-10-28T23:57:52Z

storage/operation/guarantees.go

+	))
+}
+
+type CollectionGuaranteeWithID struct {


Because we would like to insert N guarantees with one functor, instead of N functors.

Do we have any evidence that using N functors would have a non-negligible performance impact? I think it is likely to have an extremely small impact in practice, and don't think it is worth the complexity of optimizing for here.

Also, the need to pre-compute the guarantee IDs prior to passing to the functor is downstream of a lack of ID caching. General ID caching is a problem we will probably want to solve relatively soon, which would remove the need for this (we would just pass the Guarantee object around and call ID() as much as we want without per-call performance penalty). Again, unless there is evidence that we really need the optimization for this case, I suggest we opt for the simpler implementation.

jordanschalm · 2025-10-29T00:02:55Z

storage/operation/payload.go

 	return UpsertByKey(w, MakePrefix(codeBlockIDToLatestSealID, blockID), sealID)
 }

+func IndexingLatestSealAtBlock(blockID flow.Identifier, sealID flow.Identifier) Functor {


General suggestion on naming

I think explicitly using the Functor terminology in the name of functor-returning storage functions would be clearer than using the present tense.

Present tense vs imperative doesn't really communicate "this will be done in the future" to me

The "functor" version is usually named very similarly to the "non-functor" version (eg. Index vs Indexing) -- I feel it would be easy to gloss over and miss the distinction

I'm suggesting the following naming convention: $(name you would have used had it not been a functor) + "Functor"

Suggested change

-func IndexingLatestSealAtBlock(blockID flow.Identifier, sealID flow.Identifier) Functor {
+func IndexLatestSealAtBlockFunctor(blockID flow.Identifier, sealID flow.Identifier) Functor {

github-actions · 2025-10-29T15:36:22Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

zhangchiqing added 4 commits October 9, 2025 10:17

add functors

27e8e1d

refactor approvals with functors

24d9b18

refactor with functors

66fa38e

add comment

0c01194

zhangchiqing added 3 commits October 9, 2025 10:54

add test cases

cb91ffd

implement store multiple collection guarantees

24c684e

add comments

31f3a04

zhangchiqing commented Oct 9, 2025

zhangchiqing requested review from AlexHentschel and jordanschalm October 10, 2025 15:42

zhangchiqing marked this pull request as ready for review October 10, 2025 15:42

zhangchiqing requested a review from a team as a code owner October 10, 2025 15:42

jordanschalm reviewed Oct 29, 2025

Apply suggestions from code review

762690a

Co-authored-by: Jordan Schalm <[email protected]>

zhangchiqing added 4 commits October 29, 2025 08:37

renaming

5928528

rename functor names

e7165f5

Merge branch 'master' into leo/refactor-functors

21066f5

rename to UpsertFunctor and CheckHoldsLockFunctor

96985db
